fix(x402): classify transaction_held as TRANSACTION_PENDING (closes #93)#94
fix(x402): classify transaction_held as TRANSACTION_PENDING (closes #93)#94tfireubs-ui wants to merge 1 commit intoaibtcdev:mainfrom
Conversation
…ibtcdev#93) The relay's /settle endpoint returns errorReason: "transaction_held" when a sender has a nonce gap and the relay queues the tx rather than dispatching it. This error reason was not matched by classifyPaymentError, causing it to fall through to the catch-all UNEXPECTED_SETTLE_ERROR (500 UNKNOWN_ERROR). Add a transaction_held case that maps to TRANSACTION_PENDING with a 30s retryAfter — longer than the 10s for transaction_pending since resolving a nonce gap takes more time. The client receives 402 with a clear retry signal rather than a confusing 500. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
arc0btc
left a comment
There was a problem hiding this comment.
Classifies transaction_held as TRANSACTION_PENDING with a 30s retry — exactly right.
What works well:
- Placement in the classifier is correct. At the main call site (line 409), the function receives
settleResult.errorReasonas the first arg, socombined="transaction_held transaction_held". The earliernoncecheck doesn't fire ("transaction_held" contains no "nonce"), and the new case is reached cleanly. - The 30s
retryAfteris well-calibrated. Relay nonce gap resolution is not instantaneous — we've seen gaps take 30–90s to clear in production. Callers with existing retry-on-TRANSACTION_PENDING logic get the right signal to wait longer than the normal 10s. - Both match forms (
transaction_heldandtransaction held) are consistent with every other case in the file — good pattern adherence. - The inline comment explains the relay behavior clearly for anyone debugging a future incident.
[nit] The space-separated form "transaction held" looks defensive rather than necessary — every other relay error code uses snake_case exclusively in its errorReason. If the relay API contract guarantees snake_case, removing it keeps the check tight. But keeping it is consistent with how all other cases are written, so either way is fine.
Operational note: We process x402 payments through this relay continuously. During relay nonce gap windows (which can persist for 30–90s), held transactions were falling through to 500 UNKNOWN_ERROR — giving callers no indication the failure was transient or retryable. This fix makes those windows transparent and automatically recoverable via existing retry logic.
|
Merge ping — APPROVED (arc0btc). Classifies transaction_held as TRANSACTION_PENDING (closes #93). |
|
Hi! I wanted to let you know that #107 supersedes the Once you've had a chance to look at the new PR and confirm, would you mind closing this one? No rush — just want to avoid confusion once the branch lands. Thanks for the work here, the problem framing was helpful context for the design. |
* docs(planning): phase 1 research notes for boring-tx adoption Map from x402-api compat-shim code paths to the native boring-tx events emitted by x402-sponsor-relay v1.30.1. Covers shim inventory, behavior comparison table, tx-schemas entry points, relay RPC response shapes, PaymentPollingDO public API sketch, and risk list with 8 identified risks. Co-Authored-By: Claude <noreply@anthropic.com> * chore(deps): bump @aibtc/tx-schemas for boring-tx state machine Bump @aibtc/tx-schemas constraint from ^0.3.0 to ^1.0.0. The 0.x.y semver locking meant the installed 1.0.0 package was outside the resolvable range; pinning to ^1.0.0 correctly tracks the native boring-tx state machine exports needed in Phases 3-5. Also remove stale patches/x402-stacks+2.0.1.patch — the upstream package now ships with the desired 120000ms HTTP client timeout natively, making the patch-package postinstall step error on every npm install. Baseline: npm run check (tsc --noEmit) exits 0, npm run deploy:dry-run builds clean at 1027.76 KiB before any logic changes. Co-Authored-By: Claude <noreply@anthropic.com> * refactor(payments): route payment types through @aibtc/tx-schemas Create src/services/payment-contract.ts as a thin re-export helper over @aibtc/tx-schemas subpaths, providing a single import point for all payment-lifecycle types used across middleware, utilities, and Durable Objects in this service. Replace SettlementResponseV2 from x402-stacks with SettleResult (aliased from HttpSettleResponse in tx-schemas) in middleware/x402.ts and types.ts. The discriminated union type is structurally compatible; a cast is applied at the verifier.settle() call site — Phase 5 removes the x402-stacks dependency entirely when the RPC path takes over. extractCanonicalPaymentDetails is preserved (Phase 5 removes it). Co-Authored-By: Claude <noreply@anthropic.com> * feat(payments): add PaymentPollingDO for checkStatusUrl polling Adds PaymentPollingDO (SQLite-backed Durable Object) that tracks in-flight payments by polling checkStatusUrl with exponential backoff until terminal. Key design decisions: - _fetchStatus() is the single relay-contact seam: Phase 4 uses HTTP GET, issue #87 swaps to env.X402_RELAY.checkPayment(paymentId) — one-line change - computeDerivedHints() extracted to src/utils/payment-hints.ts so it is testable with bun:test without requiring cloudflare:workers runtime - DO instances are namespaced by paymentId (one per in-flight payment) - Alarm backoff: 5s (polls 1-3), 15s (4-6), 60s (7+), 10-min timeout Wiring: - wrangler.jsonc: PAYMENT_POLLING_DO binding + v3 migration tag in all envs - src/types.ts: Env interface updated with PAYMENT_POLLING_DO binding - src/index.ts: exports class; mounts GET /payment-status/:paymentId (free) Unit tests (36 passing): - computeDerivedHints covers all terminal reason categories - Stub-based track → poll → terminal happy-path flow - derivedHints per category verified Co-Authored-By: Claude <noreply@anthropic.com> * feat(payments): emit native payment.* events, drop compat shim Middleware now mints a client-side paymentId ("pay_" + crypto.randomUUID()) before submitting to the relay, injects it as the payment-identifier idempotency input, extracts checkStatusUrl from the relay response extensions (with a fallback construction), and registers the payment with PaymentPollingDO.track() as fire-and-forget. Five native lifecycle events replace all compat-shim-era event names: payment.initiated — paymentId minted, about to submit to relay payment.pending — relay acknowledged but payment is still in-flight payment.confirmed — relay settled successfully payment.failed — relay rejected with a terminal failure reason payment.replaced — payment replaced by another tx (nonce race) Deleted entirely: - extractCanonicalPaymentDetails() and all internal shim helpers - inferLegacyStatus() and inferLegacyTerminalReason() - getRetryDecisionContext() (tests/_shared_utils.ts update deferred to Phase 7) - compat_shim_used log field from buildPaymentLogFields - compatShimUsed / source fields from CanonicalPaymentDetails and RetryDecisionContext - OpenAPI schema for details.canonical in 402 response body Unit tests updated to cover the three surviving classifier predicates and the revised instability derivation signature. Co-Authored-By: Claude <noreply@anthropic.com> * feat(payments): add retryable/retryAfter/nextSteps error hints All non-200 payment error responses now carry structured retry hints in both the JSON body and the payment-response header (base64 JSON): { retryable, nextSteps, retryAfter? } nextSteps tokens are stable identifiers tied to tx-schemas terminal reason categories — not free-form prose — so clients can branch without string-parsing: rebuild_and_resign — sender nonce issue, build fresh tx retry_later — transient relay/settlement error start_new_payment — identity lost/replaced, restart x402 flow fix_and_resend — invalid payload, fix before retrying wait_for_confirmation — confirmed, delivery should proceed classifyPaymentError() now checks canonical status (failed/replaced/ not_found) from the relay response before falling back to string heuristics, so boring-tx relay responses are classified accurately. Settlement failure path: computeDerivedHints() maps canonical status + terminalReason → hints with no DO round-trip. Exception path: hintsFromClassifiedCode() derives hints from classified error code when no canonical status is available. Updated llms-full.txt error handling section and /topics/payment-flow topic doc to document the new hint shape, token vocabulary, and client retry pattern. Co-Authored-By: Claude <noreply@anthropic.com> * test(payments): cover boring-tx lifecycle end-to-end Finish the pending tests/_shared_utils.ts diff (NonceTracker, nonce resets on retry, signPaymentWithNonce helper). Add getRetryDecisionContext and RetryDecisionContext to payment-status.ts — these were already imported in the committed shared utils but never implemented. Add X-PAYMENT-ID response header in the middleware success path so lifecycle tests can extract the relay paymentId from a 200 response without parsing the settlement result's opaque extensions field. Add tests/payment-polling-lifecycle.test.ts (runPaymentPollingLifecycle): - Makes a real x402 payment to /hashing/sha256 - Reads X-PAYMENT-ID from the 200 response header - Polls GET /payment-status/:paymentId (free DO route) until terminal - Asserts the DO snapshot has expected shape (paymentId, checkStatusUrl, polledAt, pollCount, terminal status) Register payment-polling in LIFECYCLE_RUNNERS in _run_all_tests.ts. All 65 payment-*.unit.test.ts pass. npm test (quick mode, 14 stateless endpoints) passes 14/14. npm run test:full requires X402_CLIENT_PK and a local worker (X402_WORKER_URL=http://localhost:8787) since X-PAYMENT-ID header is not yet in the deployed staging worker. Co-Authored-By: Claude <noreply@anthropic.com> * refactor(payments): simplify post-boring-tx adoption Remove three dead imports from x402.ts middleware (isInFlightPaymentState, isRelayRetryableTerminalReason, isSenderRebuildTerminalReason) that were added during Phase 5 but never used in the middleware body — the retry logic that would have used them lives in tests/_shared_utils.ts instead. Clean up two stale phase-reference comments in payment-contract.ts that referred to "Phase 5" as a future event ("will widen this surface", "removes the x402-stacks dependency") — those phases are now complete. TERMINAL_STATUSES duplication between payment-hints.ts and PaymentPollingDO.ts was reviewed and intentionally left — the isolation benefit outweighs the DRY concern for a 4-element constant across a DO boundary. Co-Authored-By: Claude <noreply@anthropic.com> * chore(planning): phase 9 verification log All blocking checks pass: npm run check (0 errors), npm run deploy:dry-run (clean, PAYMENT_POLLING_DO binding confirmed), npm test (14/14), bun unit tests (114/114). Branch already on origin/main tip — no rebase needed. Known non-blocker: test:full payment-polling-lifecycle is deployment-gated (X-PAYMENT-ID header not yet on live staging). Co-Authored-By: Claude <noreply@anthropic.com> * chore(planning): phase 10 PR handoff Record PR title, body, issue comment text, and all URLs for the boring-tx state machine PR (#107) and related issue comments on #94, #106, and #87. Co-Authored-By: Claude <noreply@anthropic.com> * chore: re-trigger Workers Builds after migration bootstrap --------- Co-authored-by: Claude <noreply@anthropic.com>
|
Closing — superseded by #107 which addresses transaction classification as part of the boring-tx state machine. The |
Problem
When the relay's
/settleendpoint returnserrorReason: "transaction_held"(relay accepted the tx but queued it due to a sender nonce gap),classifyPaymentError()doesn't match this error reason. It falls through to the catch-allUNEXPECTED_SETTLE_ERROR— which produces a 500 with"code": "UNKNOWN_ERROR". The caller has no indication this is retryable and no idea why it failed.Fix
Added a
transaction_heldcase toclassifyPaymentErrorthat maps toTRANSACTION_PENDINGwithretryAfter: 30:The 30s retryAfter is longer than
transaction_pending's 10s because nonce gap resolution takes more time (relay needs to fill or clear the gap before it can dispatch the held tx).After this fix, callers receive 402 +
Retry-After: 30with a clear message instead of 500UNKNOWN_ERROR. Their existing retry logic (which already handlesTRANSACTION_PENDING) handles this gracefully.Closes #93